Topical Word Trigger Model for Keyphrase Extraction

Authors

  • Zhiyuan Liu
  • Chen Liang
  • Maosong Sun
Abstract

Keyphrase extraction aims to find representative phrases for a document. Keyphrases are expected to cover the main themes of a document. At the same time, keyphrases do not necessarily occur frequently in the document, a problem known as the vocabulary gap between the words in a document and its keyphrases. In this paper, we propose the Topical Word Trigger Model (TWTM) for keyphrase extraction. TWTM assumes that the content and the keyphrases of a document describe the same themes but are written in different languages. Under this assumption, keyphrase extraction is modeled as a translation process from document content to keyphrases. Moreover, to better cover document themes, TWTM makes the trigger probabilities topic-specific, so that the trigger process is influenced by the document themes. On the one hand, TWTM uses latent topics to model document themes and takes the coverage of document themes into consideration; on the other hand, TWTM uses topic-specific word triggers to bridge the vocabulary gap between the words in a document and its keyphrases. Experimental results on a real-world dataset show that TWTM outperforms existing state-of-the-art methods under various evaluation metrics.

KEYWORDS: keyphrase extraction, latent topic model, word trigger model.
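The idea of topic-specific triggering can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not give the model's equations, so the scoring rule used here, score(k | d) = sum over topics z of P(z | d) times the sum over document words w of P(w | d) P(k | w, z), is an assumption, and the function name `rank_keyphrases`, the topic labels, and all probability values are hypothetical. The sketch also assumes that the topic distribution and the trigger table have already been learned elsewhere.

```python
from collections import Counter


def rank_keyphrases(doc_words, topic_given_doc, trigger):
    """Rank candidate keyphrases for one document.

    Assumed scoring rule (not taken from the paper):
        score(k) = sum_z P(z|d) * sum_w P(w|d) * P(k|w,z)

    doc_words       -- list of word tokens in the document
    topic_given_doc -- dict: topic -> P(topic | document)
    trigger         -- dict: topic -> word -> keyphrase -> P(keyphrase | word, topic)
    """
    counts = Counter(doc_words)
    total = sum(counts.values())
    scores = {}
    for topic, p_topic in topic_given_doc.items():
        for word, count in counts.items():
            p_word = count / total  # empirical P(w | d)
            # Words without a trigger entry under this topic contribute nothing.
            for phrase, p_trigger in trigger.get(topic, {}).get(word, {}).items():
                scores[phrase] = scores.get(phrase, 0.0) + p_topic * p_word * p_trigger
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy inputs; every number below is made up for illustration.
doc_words = ["apple", "iphone", "apple", "launch"]
topic_given_doc = {"tech": 0.9, "food": 0.1}
trigger = {
    "tech": {
        "apple": {"smartphone": 0.6, "technology company": 0.4},
        "iphone": {"smartphone": 1.0},
    },
    "food": {
        "apple": {"fruit": 1.0},
    },
}

ranked = rank_keyphrases(doc_words, topic_given_doc, trigger)
for phrase, score in ranked:
    print(f"{phrase}: {score:.3f}")
```

In this toy input, the top-ranked phrase "smartphone" never appears in the document, which is the vocabulary gap the abstract refers to. Because the document is mostly about the "tech" topic, the word "apple" triggers technology-related phrases far more strongly than "fruit".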

Similar resources

Topical Word Importance for Fast Keyphrase Extraction

We propose an improvement on a state-of-the-art keyphrase extraction algorithm, Topical PageRank (TPR), incorporating topical information from topic models. While the original algorithm requires a random walk for each topic in the topic model being used, ours is independent of the topic model, computing only a single PageRank for each text regardless of the number of topics in the model. This in...

Automatic Keyphrase Extraction via Topic Decomposition

Existing graph-based ranking methods for keyphrase extraction compute a single importance score for each word via a single random walk. Motivated by the fact that both documents and words can be represented by a mixture of semantic topics, we propose to decompose the traditional random walk into multiple random walks specific to various topics. We thus build a Topical PageRank (TPR) on word graph t...

TopicRank: Graph-Based Topic Ranking for Keyphrase Extraction

Keyphrase extraction is the task of identifying single or multi-word expressions that represent the main topics of a document. In this paper we present TopicRank, a graph-based keyphrase extraction method that relies on a topical representation of the document. Candidate keyphrases are clustered into topics and used as vertices in a complete graph. A graph-based ranking model is applied to assi...

Reducing Over-generation Errors for Automatic Keyphrase Extraction using Integer Linear Programming

We introduce a global inference model for keyphrase extraction that reduces overgeneration errors by weighting sets of keyphrase candidates according to their component words. Our model can be applied on top of any supervised or unsupervised word weighting function. Experimental results show a substantial improvement over commonly used word-based ranking approaches.

Unsupervised Keyphrase Extraction with Multipartite Graphs

We propose an unsupervised keyphrase extraction model that encodes topical information within a multipartite graph structure. Our model represents keyphrase candidates and topics in a single graph and exploits their mutually reinforcing relationship to improve candidate ranking. We further introduce a novel mechanism to incorporate keyphrase selection preferences into the model. Experiments con...

Publication date: 2012